Poster: Jeff Kaplan Date: Mar 24, 2021 7:28am
Forum: texts Subject: Re: OCR does not any longer include necessary fonts

can you post a link to a book on archive.org that you are trying to use?

Poster: archotto Date: Mar 25, 2021 3:29am
Forum: texts Subject: Re: OCR does not any longer include necessary fonts

Hi Jeff, and thanks for your help! One of the many links to this series of books is: https://archive.org/details/darstellungdese06schwgoog/page/n235/mode/2up?q=Erdst%C3%A4llen
I am looking for expressions like "Erdstall" or "Erdställe". The word definitely appears on page 227, but the OCR search does not find it.

Poster: Jeff Kaplan Date: Mar 25, 2021 9:10am
Forum: texts Subject: Re: OCR does not any longer include necessary fonts

i'm re-running that one. check back in a day or so.

Poster: archotto Date: Mar 26, 2021 3:43am
Forum: texts Subject: Re: OCR does not any longer include necessary fonts

Thanks a lot!!

Poster: archotto Date: Apr 7, 2021 2:44am
Forum: texts Subject: Re: OCR does not any longer include necessary fonts

Hi Jeff, this problem seems to be difficult! On the other hand, it is nearly impossible to read some 36 thick volumes carefully in search of a certain word... So it would be a great help indeed if OCR worked!! Or can you think of other solutions (e.g. can I download the whole PDF files and try myself)? Best regards, Otto

Poster: Jeff Kaplan Date: Apr 7, 2021 10:39am
Forum: texts Subject: Re: OCR does not any longer include necessary fonts

you can search the FULL TEXT file https://archive.org/stream/darstellungdese06schwgoog/darstellungdese06schwgoog_djvu.txt

i found one instance of that name in it:
"In neuerer Zeit ftößt man bei Bearbeitung des Feldes öfter auf ſolche Erdſtälle."
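
Note that the OCR text spells the word with a long s ("ſ"), so a literal search for "Erdställe" does not match that line. Below is a minimal Python sketch of one way to search a saved copy of the full text while folding the long s and letter case. The function names and the local file name in the comment are hypothetical, and this is not how the archive.org search itself works.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Fold the long s and letter case so modern spellings can match."""
    text = unicodedata.normalize("NFC", text)
    return text.replace("ſ", "s").casefold()


def find_lines(full_text: str, pattern: str) -> list[tuple[int, str]]:
    """Return (line number, line) for each line whose normalized form matches."""
    regex = re.compile(pattern)
    hits = []
    for number, line in enumerate(full_text.splitlines(), start=1):
        if regex.search(normalize(line)):
            hits.append((number, line.strip()))
    return hits


# Sample line quoted from the full text above; a real run would read the
# saved file instead, e.g. open("volume06_djvu.txt", encoding="utf-8").read()
# (hypothetical local file name).
sample = "In neuerer Zeit ftößt man bei Bearbeitung des Feldes öfter auf ſolche Erdſtälle."

# Lowercase pattern covering "Erdstall", "Erdställe" and "Erdställen".
hits = find_lines(sample, r"erdst[aä]ll")
print(hits)
```

This only helps where the OCR recognized the letters correctly; it cannot recover words that were garbled during recognition.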

Poster: archotto Date: Apr 8, 2021 2:02am
Forum: texts Subject: Re: OCR does not any longer include necessary fonts

Thank you for the hint! The problem is that not all volumes offer FULL TEXT. And if you compare, for example, the original text of vol. 3 (see the list in the Wikipedia article "Schweickhardt") with its full text, you will see that the OCR that produced the full text did not work well at all. So the search is affected as well. Of course the scan quality of certain pages is low, but overall it is a matter of OCR quality (for both FULL TEXT and search). This is a real pity! If there is any solid solution, please help!

Poster: Jeff Kaplan Date: Apr 8, 2021 9:16am
Forum: texts Subject: Re: OCR does not any longer include necessary fonts

we did not scan those so nothing we can do there.

which volumes do not offer full text? please post links.

Poster: archotto Date: Apr 9, 2021 4:06am
Forum: texts Subject: Re: OCR does not any longer include necessary fonts

Full text is missing for:

Vol. 2: https://archive.org/details/darstellungdese25schwgoog

Vol. 4: https://books.google.at/books?id=e54r-t9LPf8C&redir_esc=y&hl=de

But please compare the original with the full text (beginning of vol. 1):
https://archive.org/details/darstellungdese16schwgoog/page/n13/mode/2up

9ßä>tenb einet Seit Don 10 Sa&ren , al$ ber
SJerfaffer fein 8iebtfng8fhtbium, bie öfterreiä)ifcbe
©efc&ic&te , unauegefefct eifrig betrieb , fyatte er
bie befte (Gelegenheit , alte SBerf e , bie über bie*
fed Sanb bortyanben finb , genau fennen p tcr=
nen, unb erfab bierauS, bajl (wßbrenb anbere
9>röbin$en beS öffccrrctc^ifdf)cn Äaiferjtaate8 in
neueren «Seiten SRänner gefunben , bon welchen
nfi|ftcf>e unb brauchbare' geograpbifc& 5 ftatifttfc&e
arbeiten erfebienen) ha$ SBiegenfanb beS mäfyti'
gen ©taatenförperl, ba3 @rjberjogt^um £>efter*
teiefc unter ber €it6, fein einziges SSÖerE auf jus
weifen bot/ weftbea bae 8anb im 2C Hg enteis
nett, gteicfmne burefc einzelne £)rt8befc&reis
bung/ umfaffenb (nämtfefc bon feinem @ntfle*
ben an ununterbrochen) batgefieHt fyättt ; benn
außer einigen fttetn , $um £betf fefron unbrauefc*
bar geworbenen, bloß topograpbtfc&en Söer«
fen, unb ber »erbienjiuoHen Söearbei*
t u n g ber l irä)tt#en £opograpbie / bie aber bi$
iefct nur einzelne £)ecanate balb auö biefem, halb
au8 jenem Giertet befebrieb, unb beren ©nbe

Unfortunately I cannot include a jpg (mail does not work).
As you can see, this FULL TEXT does not help at all: hardly any word is spelled correctly, and therefore none of them can be located by search!

Best regards
Otto

This post was modified by archotto on 2021-04-09 11:06:34

Poster: Jeff Kaplan Date: Apr 9, 2021 10:12am
Forum: texts Subject: Re: OCR does not any longer include necessary fonts

i'm re-running that. we only recently added the ability to OCR Fraktur fonts.

Poster: archotto Date: Apr 7, 2021 2:27am
Forum: texts Subject: Re: OCR does not any longer include necessary fonts

Hi Jeff, the problem seems to be difficult. But it would be a really great help if OCR worked (it is nearly impossible to read 36 volumes carefully)!! Could you think of another solution (maybe I can download the PDF files and try myself)? Best regards, Otto

Poster: archotto Date: Apr 7, 2021 2:28am
Forum: texts Subject: Re: OCR does not any longer include necessary fonts

Hi Jeff, the problem seems to be difficult. But it would be a really great help if OCR worked!! Could you think of another solution (maybe I can download the PDF files and try myself)? Best regards, Otto